5 research outputs found

Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS

Author: Giles M.
Mudalige G.R.
Reguly István Zoltán
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
The key common bottleneck in most stencil codes is data movement, and prior research has shown that improving data locality through optimisations that schedule across loops does particularly well. However, in many large PDE applications it is not possible to apply such optimisations through compilers because there are many options, execution paths and data per grid point, many dependent on run-time parameters, and the code is distributed across different compilation units. In this paper, we adapt the data locality improving optimisation called iteration space slicing for use in large OPS applications both in shared-memory and distributed-memory systems, relying on run-time analysis and delayed execution. We evaluate our approach on a number of applications, observing speedups of 2× on the Cloverleaf 2D/3D proxy applications, which contain 83 and 141 loops respectively, 3.5× on the linear solver TeaLeaf, and 1.7× on the compressible Navier-Stokes solver OpenSBLI. We demonstrate strong and weak scalability up to 4608 cores of CINECA's Marconi supercomputer. We also evaluate our algorithms on Intel's Knights Landing, demonstrating maintained throughput as the problem size grows beyond 16GB, and we do scaling studies up to 8704 cores. The approach is generally applicable to any stencil DSL that provides per loop data access information.

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Oxford University Research Archive

Repository of the Academy's Library

Designing OP2 for GPU architectures

Author: Asouti
B. Spencer
Burgess
C. Bertolli
Corrigan
DeVito
G.R. Mudalige
Giles
Howes
I. Reguly
M.B. Giles
Moinier
Publication venue: 'Elsevier BV'

Crossref

Performance prediction and procurement in practice : assessing the suitability of commodity cluster components for wavefront codes

Author: A. Vadgama
A.B. Mills
Adve
Alexandrov
Brewer
Culler
Frank
G.R. Mudalige
Gabriel
Gropp
Hammond
Hoisie
I. Miller
J. Holt
J.A. Davis
J.A. Herdman
J.A. Smith
Johnson
Karp
Kerbyson
Mudalige
Nudd
Reinhardt
S.A. Jarvis
S.D. Hammond
Schmuck
Sundaram-Stukel
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2009
The cost of state-of-the-art supercomputing resources makes each individual purchase a lengthy and expensive process. Often each candidate architecture will need to be benchmarked using a variety of tools to assess likely performance. However, benchmarking alone provides only limited insight into the suitability of each architecture for key codes and may give misleading results when assessing their scalability. In this study the authors present a case study of the application of recently developed performance models of the Chimaera benchmarking code written by the United Kingdom Atomic Weapons Establishment (AWE), with a view to analysing how the code will perform and scale on a medium-sized, commodity-based InfiniBand cluster. The models are validated and demonstrate a greater than 90% accuracy for an existing InfiniBand machine; the models are then used as the basis for predicting code performance on a variety of alternative hardware configurations, which include changes in the underlying network, the use of faster processors and the use of a higher core density per processor. The results demonstrate the compute-bound nature of Chimaera and its sensitivity to network latency at increased processor counts. Using these insights, the authors discuss potential strategies which may be employed during the procurement of future mid-range clusters for wavefront-rich workloads.

CiteSeerX

Crossref

University of Birmingham Research Portal

Warwick Research Archives Portal Repository

Performance modelling of magnetohydrodynamics codes

Author: A. Alexandrov
A.F. Rodrigues
C.L. Janssen
D. Culler
D. Kerbyson
D. Sundaram-Stukel
G.R. Mudalige
M. Griebel
M.B. Giles
S. Ryoo
S.J. Pennycook
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Performance modelling is an important tool utilised by the High Performance Computing industry to accurately predict the run-time of science applications on a variety of different architectures. Performance models aid in procurement decisions and help to highlight areas for possible code optimisations. This paper presents a performance model for a magnetohydrodynamics physics application, Lare. We demonstrate that this model is capable of accurately predicting the run-time of Lare across multiple platforms with an accuracy of 90% (for both strong and weak scaled problems). We then utilise this model to evaluate the performance of future optimisations. The model is generated using SST/macro, the machine-level component of the Structural Simulation Toolkit (SST) from Sandia National Laboratories, and is validated on both a commodity cluster located at the University of Warwick and a large-scale capability resource located at Lawrence Livermore National Laboratory.

CiteSeerX

Crossref

University of Birmingham Research Portal

Warwick Research Archives Portal Repository

White Rose Research Online